Prediction

Predictions​

Predictor Classes​

Decoding Strategies​

eole.predict.greedy_search.sample_with_temperature(logits, temperature, top_k, top_p)​

Select next tokens randomly from the top k possible next tokens.

Samples from a categorical distribution over the top_k words using the category probabilities logits / temperature.

  • Parameters:
    • logits (FloatTensor) – Shaped (batch_size, vocab_size). These can be logits (in (-inf, inf)) or log-probs (in (-inf, 0]). (The distribution actually uses the log-probabilities logits - logits.logsumexp(-1), which equal the original values if they are already log-probabilities, i.e. their exponentials sum to 1.)
    • temperature (float) – Used to scale down logits. The higher the value, the more likely it is that a non-max word will be sampled.
    • top_k (int) – This many words could potentially be chosen. The other logits are set to have probability 0.
    • top_p (float) – Keep the most likely words until their cumulative probability is greater than top_p. If used together with top_k, both conditions are applied.
  • Returns:
    • topk_ids: Shaped (batch_size, 1). These are the sampled word indices in the output vocab.
    • topk_scores: Shaped (batch_size, 1). These are essentially (logits / temperature)[topk_ids].
  • Return type: (LongTensor, FloatTensor)
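As a concrete illustration, the sketch below shows how temperature scaling, top-k filtering, and top-p (nucleus) filtering can be combined before sampling. It is a minimal re-implementation for illustration only, not the actual eole code; the function name and the exact filtering details are assumptions.

```python
import torch


def sample_with_temperature_sketch(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Illustrative sketch only; the real eole implementation may differ."""
    logits = logits / max(temperature, 1e-8)  # scale logits by temperature
    if top_k > 0:
        # Keep only the top_k largest logits; mask the rest to -inf (probability 0).
        kth_best = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    if top_p < 1.0:
        # Nucleus filtering: keep the most likely words until the cumulative
        # probability exceeds top_p, then mask the rest.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        probs = sorted_logits.softmax(dim=-1)
        mass_before = probs.cumsum(dim=-1) - probs
        sorted_logits = sorted_logits.masked_fill(mass_before > top_p, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(
            -1, sorted_idx, sorted_logits
        )
    dist = torch.distributions.Categorical(logits=logits)
    topk_ids = dist.sample().unsqueeze(-1)     # (batch_size, 1)
    topk_scores = logits.gather(-1, topk_ids)  # (batch_size, 1)
    return topk_ids, topk_scores
```

For logits shaped (batch_size, vocab_size), this returns the sampled indices and their temperature-scaled scores with the (batch_size, 1) shapes documented above.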

Scoring​

class eole.predict.penalties.PenaltyBuilder(cov_pen, length_pen)​

Bases: object

Returns the length and coverage penalty functions for beam search.

  • Parameters:
    • length_pen (str) – option name of the length penalty
    • cov_pen (str) – option name of the coverage penalty
  • Variables:
    • has_cov_pen (bool) – Whether a coverage penalty is set (if it is None, applying it is a no-op). Note that the converse isn’t true: setting beta to 0 should also force the coverage penalty to be a no-op.
    • has_len_pen (bool) – Whether a length penalty is set (if it is None, applying it is a no-op). Note that the converse isn’t true: setting alpha to 1 should also force the length penalty to be a no-op.
    • coverage_penalty (callable[[FloatTensor, float], FloatTensor]) – Calculates the coverage penalty.
    • length_penalty (callable[[int, float], float]) – Calculates the length penalty.
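Since the builder exposes the selected functions as the coverage_penalty and length_penalty callables documented above, a typical beam re-ranking step might look like the sketch below. The option strings ("wu", "avg"), the toy tensors, and the exact way the penalties are combined with the scores are assumptions for illustration, not values taken from this page.

```python
import torch

from eole.predict.penalties import PenaltyBuilder

# Option names are assumed; check your eole version for the accepted values.
builder = PenaltyBuilder(cov_pen="wu", length_pen="avg")

log_probs = torch.randn(4)              # toy beam scores, one per hypothesis
cov = torch.rand(4, 7).softmax(dim=-1)  # toy accumulated attention, (beams, seq_len)

# GNMT-style re-ranking: divide by the length penalty,
# then subtract the coverage penalty.
scores = log_probs / builder.length_penalty(5, alpha=0.9)
scores = scores - builder.coverage_penalty(cov, beta=0.2)
```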

coverage_none(cov, beta=0.0)​

Returns zero as the penalty.

coverage_summary(cov, beta=0.0)​

Our summary penalty.

coverage_wu(cov, beta=0.0)​

GNMT coverage re-ranking score.

See “Google’s Neural Machine Translation System”. cov is expected to be sized (*, seq_len), where * is probably batch_size x beam_size but could be several dimensions like (batch_size, beam_size). If cov is attention, then the seq_len axis probably sums to (almost) 1.
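The coverage penalty in that paper (Wu et al., 2016) is beta times the sum, over source positions, of log(min(coverage, 1.0)). A minimal sketch of that computation, assuming the sum is taken over the last (seq_len) axis:

```python
import torch


def coverage_wu_sketch(cov, beta=0.0):
    """Sketch of the GNMT coverage penalty; the real eole code may differ."""
    return beta * torch.min(cov, torch.ones_like(cov)).log().sum(dim=-1)
```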

length_average(cur_len, alpha=1.0)​

Returns the current sequence length.

length_none(cur_len, alpha=0.0)​

Returns unmodified scores.

length_wu(cur_len, alpha=0.0)​

GNMT length re-ranking score.

See “Google’s Neural Machine Translation System”.
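The length penalty described in that paper (Wu et al., 2016) is lp(Y) = ((5 + |Y|) / 6) ** alpha, and beam scores are divided by it. A minimal sketch, assuming cur_len is the current hypothesis length:

```python
def length_wu_sketch(cur_len, alpha=0.0):
    """Sketch of the GNMT length penalty; the real eole code may differ."""
    return ((5.0 + cur_len) / 6.0) ** alpha
```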